What We'll Cover
Everything covered so far this week has followed the same basic pattern: you ask the AI something, it produces code or analysis, you check it and decide what to do next. This conversational loop is genuinely useful — but it is no longer the frontier of what AI can do for data-intensive research.
This session introduces a different paradigm: agentic AI, where instead of asking the AI questions one at a time, you give it tools, a goal, and context — and let it plan and execute a multi-step analysis pipeline largely on its own. The AI reads your actual files, runs code, checks the results, fixes problems, and iterates, without you feeding it information piece by piece.
This is not science fiction. It is deployed and available today. But it introduces a new set of considerations — around setup, oversight, reproducibility, and trust — that are quite different from conversational AI use. By the end of this session you will understand what agentic analysis actually means, how to set up a project for it, and — crucially — where it still needs you.
🔀 The Paradigm Shift: From Conversation to Agency
The distinction between a conversational AI tool and an agentic one is not just about speed or capability — it is a fundamentally different model of how the AI participates in your work.
Conversational AI (What You've Used So Far)
You are the one who runs code, checks results, and decides what to ask next. The AI responds to each message in isolation, without access to your actual files or the ability to act on its own suggestions.
- You paste data samples or error messages into the chat
- AI generates a code snippet you copy and run yourself
- You read the output, decide if it's right, and ask the next question
- The AI has no memory between sessions and no access to your file system
- You are the connection between the AI's suggestions and your actual environment
Agentic AI (The New Paradigm)
The AI operates in your environment, with access to your files, the ability to run code, and the capacity to plan and execute multi-step tasks autonomously — checking its own work and adjusting as it goes.
- The AI reads your actual data files directly — no copy-pasting
- It runs code, sees the output, and decides what to do next
- It can fix its own errors, iterate on results, and pursue a goal across many steps
- It maintains context across a project through structured documentation (CLAUDE.md)
- You set the goal and the constraints; the agent handles the execution
💡 The USB-C Analogy
The key enabling technology here is the Model Context Protocol (MCP), announced by Anthropic in November 2024 and now an open industry standard supported by Anthropic, OpenAI, and Google. MCP has been described as "USB-C for AI" — a single universal protocol that lets AI models connect to any external system: databases, file systems, APIs, web browsers, version control systems.
Before MCP, connecting an AI to an external data source required bespoke integration work for every combination of AI and data system. Now, with a standard protocol and over 8,600 available MCP servers, you can give an AI direct, secure access to a PostgreSQL database, a file system, or a cloud storage bucket using off-the-shelf components. This is what makes genuine agentic research workflows practical rather than theoretical.
⚙️ What Agentic Tools Can Actually Do in 2026
The capability of current agentic tools has expanded significantly in the past 18 months. Here is an honest picture of what is actually possible today — based on what tools like Claude Code, Codex, and Gemini CLI can do in practice.
Claude Code: The Current Capability Ceiling
Claude Code is currently the most capable agentic coding tool for research use. In practical terms, it can:
- Read entire project structures: It understands multi-file dependencies, not just individual snippets. You can point it at a project directory and it will orient itself before writing a single line.
- Execute and iterate autonomously: It runs code, sees the results, fixes errors, and continues — without you prompting it after each step. As of early 2026, median task durations have reached over 45 minutes, with some sessions running autonomously for hours.
- Spawn parallel sub-agents: For complex multi-step analyses, it can launch independent agents to work on different tasks simultaneously — one cleaning data while another generates exploratory plots.
- View its own visual outputs: As discussed in Sub-Lesson 3, it can inspect plots it generates and critique them for quality and accessibility issues.
- Connect to external data sources via MCP: With the right MCP server configured, it can query a database, read cloud storage, or call an API directly, without you manually exporting data first.
Also in the Ecosystem
Codex (OpenAI) — Claude Code's primary commercial competitor, launched as a dedicated agent in April 2025. Terminal-based, file-reading, open-source CLI. Tightly integrated with the OpenAI ecosystem.
Gemini CLI (Google) — Open-source terminal agent released January 2026. Strong Google Cloud integration; supports MCP. Useful if your data lives in Google infrastructure.
Open Interpreter — Open-source, runs fully locally. Critical distinction: your data never leaves your machine, making it relevant for sensitive research data that cannot be sent to a cloud provider.
The Research Frontier
The most ambitious demonstration of agentic research capability is the Sakana AI Scientist. Version 2 (2025) automates the entire research lifecycle: idea generation, code writing, experiment execution, result analysis, and paper writing — at approximately $15 per paper.
One AI Scientist paper achieved peer-review acceptance at an ICLR 2025 workshop; the system description was published in Nature. A 2026 Nature editorial noted that "AI scientists are changing research — institutions must respond." This is not an endorsement — critical evaluations identify serious weaknesses in literature synthesis and novelty assessment — but it marks the direction of travel.
🔌 MCP: Connecting AI to Your Research Data
For most researchers, the most practically valuable aspect of agentic tools is direct data access via MCP. Instead of exporting a CSV, pasting a sample into a chat window, and hoping the AI correctly infers your data structure, you give the agent a connection to your actual data — and it reads, queries, and works with it directly.
What MCP Actually Connects
MCP servers expose three types of capability to an AI agent:
- Resources — structured data injected directly into the model's context: file contents, database records, API responses
- Tools — functions the agent can call to retrieve or act on data: run a SQL query, write a file, search a codebase
- Prompts — reusable templates for structured interactions with specific data systems
For research workflows, the most useful MCP servers are:
| Category | MCP Servers | Research Use Case |
|---|---|---|
| Filesystem | Official Filesystem server | Agent reads your data files, writes output, navigates project structure directly |
| Databases | PostgreSQL, SQLite, MySQL, MongoDB | Agent queries your research database, runs analysis directly on stored data |
| Cloud storage | Google Cloud (BigQuery, Cloud SQL) | Large datasets in cloud infrastructure; agent runs queries without downloading |
| Version control | Git server | Agent reads code history, understands project evolution, makes commits |
| Semantic search | Chroma MCP | Vector database for finding relevant documents or past analyses |
📌 You Do Not Need to Understand MCP Deeply to Use It
MCP configuration typically involves adding a few lines to a JSON settings file that tell Claude Code which servers to connect to and with what permissions. For filesystem access — the most common research use case — the setup takes about five minutes. The official MCP server repository at github.com/modelcontextprotocol/servers provides ready-made configurations for all major data systems. You do not need to build or maintain anything — you are connecting to existing infrastructure.
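As an illustrative sketch, a minimal MCP configuration granting filesystem access to a single project directory might look like the following. The path is a placeholder you would replace with your own; check the official repository above for the current server package names:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/path/to/your/project/data"
      ]
    }
  }
}
```

Note that the path you list is the boundary of what the agent can read and write — scoping it to a data directory rather than your home directory is the "minimum necessary access" principle in practice.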
📄 CLAUDE.md: Onboarding the Agent to Your Project
When you start a new Claude Code session, the agent knows nothing about your project. CLAUDE.md is how you fix that. It is a plain Markdown file in your project root that Claude Code reads automatically at the start of every session — think of it as the onboarding document you would give a new research assistant joining your project on their first day.
📋 What Goes in a Research CLAUDE.md
Running /init in Claude Code will auto-generate a starter file based on what it finds in your project. But for a research data analysis project, you want to customise it carefully. A well-structured research CLAUDE.md typically includes:
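An illustrative sketch follows — the project details, paths, and conventions below are invented placeholders for a hypothetical study, not a template to copy verbatim:

```markdown
# Project: Longitudinal intervention study (illustrative example)

## Data
- Raw data in /data/raw/ is read-only — never modify it
- Cleaned waves in /data/processed/ as wave1.csv, wave2.csv, wave3.csv
- Merge key: participant_id; missing values are coded NA, never -99

## Analytical conventions
- Mixed-effects models; always report 95% CIs, not just p-values
- Use non-parametric tests (e.g. Mann-Whitney) for Likert-scale items
- All figures: colourblind-safe palette, exported as PDF to /output/figures/

## Constraints
- Never drop cases silently — flag and log every exclusion
- Do not run the final pre-registered model without an explicit go-ahead
- Log every analytical decision to /output/decision_log.md
```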
Notice what this achieves: every statistical convention you care about, every data-handling rule, and every constraint on what the agent should not do is specified upfront. The agent applies these throughout the session — you do not need to repeat them in every prompt.
⚠️ CLAUDE.md Is Not Magic
The agent reads CLAUDE.md but it does not always follow every instruction perfectly, especially in long or complex sessions. Think of it as the difference between telling a new research assistant your conventions on their first day versus expecting them to remember everything perfectly three months later. It significantly improves consistency, but it does not replace oversight. Keep your CLAUDE.md under 200 lines — it consumes context window space, and more instructions do not always mean better adherence.
🏃 Running an Agentic Analysis Pipeline
What does it actually look like when you hand an analysis task to an agent rather than conducting it conversationally? Here is a concrete research scenario.
🔬 A Real Agentic Workflow: From Raw Data to Annotated Results
You have six months of longitudinal survey data across three sites, pre-cleaned and stored as CSV files. Your research question is whether a teaching intervention improved outcomes differently across sites. Instead of a back-and-forth conversation, you write one goal statement to Claude Code:
"Load the three wave CSV files in /data/processed/, merge them on participant_id, run our pre-registered mixed-effects model (intervention × site × time), produce the three figures from our analysis plan, and save a summary table in /output/. Flag anything unexpected in the data. Use the conventions in CLAUDE.md throughout."
The agent then:
- Reads the CLAUDE.md to understand your project context, conventions, and constraints
- Loads all three CSV files, checks that participant IDs merge cleanly, and flags the 12 cases where they do not — rather than silently dropping them
- Runs the mixed-effects model using the specified library, checks convergence warnings, and tries an alternative specification when the first does not converge cleanly
- Produces the three figures in the specified format, checks them for colourblind safety, and exports them as PDF to /output/figures/
- Writes a summary table and a short plain-language interpretation of the key findings to /output/results_summary.md
- Writes a log of every decision it made — which merge strategy it used, how it handled the 12 mismatched IDs, what changed between the first and second model specification
This is not hypothetical — this is what current tools can do with a well-structured project and a good CLAUDE.md. What it does not do is tell you whether the findings are scientifically meaningful, whether the model specification is appropriate for your theory, or whether the "unexpected patterns" it flagged matter for your interpretation. That is still yours.
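The merge-and-flag behaviour in the second step above can be sketched in plain Python. This is a simplified stand-in, assuming each wave has already been loaded as a dict keyed by participant ID rather than a real CSV file:

```python
def merge_waves(waves):
    """Merge per-wave records on participant ID, flagging incomplete cases.

    `waves` maps a wave name to a dict of {participant_id: record}.
    Returns (merged, flagged): merged rows for participants present in
    every wave, plus the set of IDs missing from at least one wave --
    reported for review, never dropped silently.
    """
    id_sets = [set(records) for records in waves.values()]
    complete = set.intersection(*id_sets)
    flagged = set.union(*id_sets) - complete
    merged = {
        pid: {wave: records[pid] for wave, records in waves.items()}
        for pid in complete
    }
    return merged, flagged
```

The design choice worth noting is that mismatched IDs are returned to the caller rather than discarded — the same behaviour the goal statement demanded of the agent.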
💡 The Goal Statement Is Everything
The quality of an agentic analysis depends heavily on the quality of the goal statement you give it upfront. Vague goals ("analyse my data") produce vague results. Specific, bounded goals ("run the pre-registered model, produce these specific outputs, flag anomalies, log all decisions") produce useful results. The goal statement is your primary lever — not the prompts you send during execution, because for much of a well-specified task, you will not be sending prompts at all.
👀 Oversight: The Non-Negotiable Part
Agentic tools introduce a specific oversight challenge that does not exist in conversational use: the agent can take many consequential actions between your check-ins. Getting oversight right is the central governance problem of agentic research workflows.
The Reliability Mathematics
Here is a sobering calculation. Suppose each step in an agentic pipeline succeeds with 90% reliability — a reasonable figure for current tools on well-defined tasks. For a 5-step pipeline:
0.9 × 0.9 × 0.9 × 0.9 × 0.9 ≈ 0.59, or 59% end-to-end reliability
For a 10-step pipeline: 35%. For 20 steps: 12%.
This is a structural property of sequential processes; it holds no matter how capable the individual steps are. It is why long autonomous agentic pipelines need human checkpoints: not because AI tools are uniquely unreliable, but because independent failure probabilities compound.
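The compounding is easy to verify for yourself:

```python
def pipeline_reliability(step_reliability, n_steps):
    """End-to-end success probability of a sequential pipeline
    where every step must succeed independently."""
    return step_reliability ** n_steps

for n in (5, 10, 20):
    print(f"{n:2d} steps at 90% each -> "
          f"{pipeline_reliability(0.9, n):.0%} end-to-end")
# -> 59%, 35%, and 12% respectively
```

The same function also shows the flip side: a checkpoint that resets the error budget every five steps keeps each segment at 59% reviewable reliability, rather than letting the whole pipeline drift toward 12%.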
✓ What Good Oversight Looks Like
- Review the plan before execution: Ask the agent to outline its intended steps before it starts. If the plan is wrong, catching it here is infinitely cheaper than catching it after 30 minutes of autonomous work
- Set explicit checkpoints: Instruct the agent to pause and report at key decision points — after data loading, after cleaning, before running the final model
- Require a decision log: Every choice the agent makes (how to handle missing data, which column it used for grouping) should be logged so you can audit it
- Verify outputs, not just completion: The agent finishing is not the same as the agent being right. Check key results against known values or sanity checks
- Start with a small scope: On a new dataset, run the agent on a 10% sample first. Check the results thoroughly before running it on the full data
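A decision log need not be sophisticated: an append-only JSON Lines file is enough to make the agent's choices auditable after the fact. A minimal sketch — the file name and the example entry below are assumptions, not output from any real tool:

```python
import datetime
import json

def log_decision(path, step, choice, rationale):
    """Append one auditable decision record as a JSON line."""
    entry = {
        "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "step": step,
        "choice": choice,
        "rationale": rationale,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

# Hypothetical entry the agent (or you) might record during a merge step:
log_decision(
    "decision_log.jsonl",
    step="merge",
    choice="inner join on participant_id",
    rationale="12 IDs missing from wave 3; flagged for review, not dropped",
)
```

Because each line is self-contained JSON, the log can be grepped, diffed, and committed to version control alongside the analysis outputs.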
✗ What Goes Wrong Without It
- Silent scope creep: Agents given broad access sometimes take actions beyond their intended task — modifying files you expected to be read-only, or running analyses you did not ask for
- Compounding errors: An error in step 2 that goes unchecked propagates through every subsequent step. The final output looks complete but is built on a flawed foundation
- Reproducibility loss: Long agentic sessions can be non-deterministic. If you cannot reconstruct exactly what the agent did and why, you cannot defend your methodology in a paper or reproduce your own results
- Autonomous hypothesis-chasing: A Nature editorial raised the concern that agentic systems that iterate analysis until they find significant results could automate p-hacking at a scale that "overloads publishing peer-review systems without shifting the needle on discovery"
⚠️ Security: A Genuine Concern for MCP
When you connect an AI agent to real institutional data systems via MCP, you introduce security considerations that do not arise in conversational use. Palo Alto Networks' Unit 42 has documented MCP-specific attack vectors including tool poisoning (malicious instructions hidden in tool metadata) and credential theft via indirect prompt injection. For researchers connecting agents to institutional databases, applying the principle of minimum necessary access is not optional — give the agent access to the specific data it needs for the task, not to your entire institutional data infrastructure. Check with your institution's IT security team before connecting research data systems to external MCP servers.
⚖️ When to Use Agentic vs. Conversational
Agentic tools are not simply "better" than conversational ones. They are suited to different kinds of tasks. Choosing between them is a judgment call you will make regularly as AI becomes more capable.
A Decision Framework
| | Use Agentic When… | Use Conversational When… |
|---|---|---|
| Task structure | You can specify the goal clearly upfront and the steps are mostly implementation | The direction depends on what you find — each result changes what you ask next |
| Domain judgment | The required judgments are ones you can specify in advance (e.g., "use Mann-Whitney for Likert data") | Significant domain judgment is required at each step that you cannot specify in advance |
| Scale | Large datasets, many files, or multi-step pipelines where copy-pasting is impractical | Small datasets, single questions, learning or exploring what the data contains |
| Reproducibility | You have set up logging and can audit the agent's decisions after the fact | The conversational history itself serves as the record of your analytical decisions |
| Data sensitivity | You have reviewed access permissions and the data is appropriate for cloud processing, or you are using a local-only tool | You are working with samples or synthetic data and not transmitting sensitive information |
📌 The Honest Assessment
As of 2026, most researchers using agentic tools for the first time find that the setup cost — structuring a project correctly, writing a good CLAUDE.md, configuring MCP — takes longer than they expected, and the first few autonomous runs require significant review and correction. The payoff comes on the second, third, and tenth use of a well-configured project, when the same setup generates consistent, auditable results with much less manual effort. Agentic tools reward investment in project structure in a way that conversational tools do not.
📚 Readings for This Session
📖 Core Reading 1
Anthropic (2024). "Introducing the Model Context Protocol." — The original announcement of MCP, explaining the motivation, architecture, and vision. Short and accessible — this is the primary source for understanding what MCP is and why it was created. Read alongside the CLAUDE.md best practices guide from Anthropic's blog.
📖 Core Reading 2
Zardiashvili et al. (2025). "Risks of AI scientists: prioritizing safeguarding over autonomy." Nature Communications. — A rigorous analysis of the specific risks that arise when AI agents are used for autonomous scientific research. Proposes a triadic safeguarding framework for responsible agentic research deployment, built on three pillars: human regulation, agent alignment, and environmental feedback. Essential reading before deploying agents on real research data.
📖 Core Reading 3
Nature (2026). "AI scientists are changing research — institutions must respond." — A policy-oriented editorial on the institutional implications of agentic research tools. Covers reproducibility challenges, peer-review implications, and what research institutions need to put in place. Good for understanding the broader context in which you will be using these tools during your career.
📄 Supplementary: Practical Guides
Claude Code Documentation — The official guide to Claude Code, including sub-agents, CLAUDE.md best practices, MCP configuration, and multi-agent workflows. Best practices section is particularly relevant.
Agentic AI for Scientific Discovery: A Survey (arXiv:2503.08979, 2025) — A comprehensive survey of where agentic AI is currently deployed in scientific research, what works, and where the significant gaps remain. Good context for understanding which research domains are furthest ahead.
Key Takeaways
The paradigm has shifted from conversation to agency. Giving an AI tools, context, and a goal — and letting it execute a multi-step pipeline autonomously — is now practical for research workflows. MCP, Claude Code, Codex, and Gemini CLI make this accessible without deep technical expertise.
CLAUDE.md is the most important document you will write for an agentic project. It specifies your data structure, analytical conventions, and constraints upfront, so the agent applies them throughout a session without you repeating them. Investing 30 minutes in a good CLAUDE.md saves hours of downstream correction.
Oversight is mathematics, not distrust. The reliability degradation in sequential pipelines means checkpoints are not optional — they are structural requirements for any serious multi-step agentic analysis. Review the plan before execution, require decision logs, and verify outputs, not just completion.
Agentic tools reward project structure. The setup cost is real, but it pays off across repeated use. The researchers who benefit most from agentic tools are those who treat project organisation as a first-class research activity, not an afterthought.
The judgment is still yours. Agentic tools can execute a pre-specified analysis faster and more consistently than you can by hand. They cannot decide what question to ask, whether a result is scientifically meaningful, or how to handle a finding that contradicts your expectations. The frontier has moved, but the researcher has not been replaced — the researcher has been given a more capable collaborator.
Next session: Sub-Lesson 7 brings together the full week with hands-on activities. You will practice both conversational and agentic workflows on real data, compare the results, and complete the weekly assessment.